Introduction to Machine Learning via Tidymodels
Surprise
You have already done machine learning in this class
Linear regression from the lake ice model
fit <- lm(doy ~ year, data = sunapee)
Prediction vs. understanding
- Modeling for understanding
- Parameters and model matter
- e.g., is the parameter that relates temperature to growth different from zero?
- Modeling for prediction
- Quality of prediction matters
- Machine learning!
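A minimal sketch of the two goals using one fitted model. The sunapee data are not reproduced here, so the data frame below is simulated stand-in data with invented values; only the column names doy and year come from the lake ice model.

```r
# Simulated stand-in for the lake ice data (values are invented for illustration)
set.seed(42)
sunapee_sim <- data.frame(year = 1900:2020)
sunapee_sim$doy <- 130 - 0.05 * (sunapee_sim$year - 1900) +
  rnorm(nrow(sunapee_sim), sd = 5)

fit <- lm(doy ~ year, data = sunapee_sim)

# Modeling for understanding: inspect the slope and whether it differs from zero
slope_table <- summary(fit)$coefficients
slope_table["year", c("Estimate", "Pr(>|t|)")]

# Modeling for prediction: only the predicted value for new data matters
pred_2030 <- predict(fit, newdata = data.frame(year = 2030))
pred_2030
```

The same lm object serves both goals; what changes is which output you look at and how you judge whether the model is good.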
Machine learning
Machine learning is a field of inquiry devoted to understanding and building methods that “learn” – that is, methods that leverage data to improve performance on some set of tasks. - Wikipedia
Broad classes of ML
Supervised
- Examples:
- Images are labeled by a human (criminal vs. non-criminal)
- Values are provided for a continuous variable (e.g., stream nitrate)
Unsupervised
- Examples:
- Recommendation systems - if you like a movie, ML can find movies like it to recommend
- What are the characteristics of different groups that buy a product?
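The difference between the two classes can be sketched in a few lines. This example uses iris, a dataset built into R, which is not part of the course materials; the choice of three clusters is an assumption made for illustration.

```r
# Supervised: the response (Petal.Length) is provided for every row
sup_fit <- lm(Petal.Length ~ Petal.Width, data = iris)
sup_pred <- predict(sup_fit, newdata = data.frame(Petal.Width = 1.5))
sup_pred

# Unsupervised: no labels are given; k-means groups rows by similarity alone
set.seed(1)
clusters <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 10)
table(clusters$cluster)
```

In the supervised case the model is told the right answer during training. In the unsupervised case the Species column is never used, so the algorithm has to find structure on its own.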
ML Flow chart
[Figure: ML flow chart]
Big picture steps in ML
- Define question (what are you predicting…is it ethical?)
- Obtain data
- Define the type of ML (supervised vs. unsupervised, regression vs. classification).
- Identify method that will be used (this influences how data will be pre-processed)
- Pre-process data (also called feature engineering)
- Define specific approach for applying the model (i.e., which R package)
- Define data splits (training vs. testing)
- Fit (train) model to training data
- Evaluate (validate) the model with testing data (the model has already seen the training data and can overfit it, so only data held out from training gives an honest measure of prediction quality)
- Deploy model (predict new data).
Lake Ice module as ML
- Define question: what will the lake ice date (as day of year) be in 2030?
- Obtain data: read_excel
- Define the type of ML: supervised regression
- Identify method that will be used: linear regression
- Pre-process data: I already converted the date to day of year (DOY)
- Define specific approach for applying the model: lm
- Define data splits: we didn’t do this
- Fit (train) model to training data: lm(doy ~ year, data = sunapee)
- Evaluate (validate) the model with testing data: we didn’t do this
- Deploy model (predict new data): predicted for year 2030
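The two steps that were skipped (data splits and evaluation) could look roughly like this in base R. This is a sketch, not the workflow used in the lake ice module: the data are simulated with invented values, and the 80/20 random split and the RMSE metric are assumptions chosen for illustration.

```r
# Simulated stand-in for the lake ice data (values are invented for illustration)
set.seed(123)
ice_sim <- data.frame(year = 1900:2020)
ice_sim$doy <- 130 - 0.05 * (ice_sim$year - 1900) +
  rnorm(nrow(ice_sim), sd = 5)

# Define data splits: hold out a random 20% of years for testing
test_rows <- sample(nrow(ice_sim), size = round(0.2 * nrow(ice_sim)))
train <- ice_sim[-test_rows, ]
test <- ice_sim[test_rows, ]

# Fit (train) the model on the training data only
fit_train <- lm(doy ~ year, data = train)

# Evaluate on the testing data with root mean squared error (RMSE)
test_pred <- predict(fit_train, newdata = test)
rmse_test <- sqrt(mean((test$doy - test_pred)^2))
rmse_test

# Deploy: predict a new year
predict(fit_train, newdata = data.frame(year = 2030))
```

A random split is only one option. For a time series like this, holding out the most recent years is another reasonable choice, since it is closer to the forecasting question being asked.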
Tidyverse take on ML
Tidymodels (meta-package)
Tidymodels
The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles.
Whether you are just starting out today or have years of experience with modeling, tidymodels offers a consistent, flexible framework for your work.
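To give a feel for the framework, here is a hedged sketch of the same linear regression expressed with tidymodels. It assumes the tidymodels package is installed and R is version 4.1 or later (for the native pipe). The data are simulated with invented values, and the object names (ice_sim, ice_split, and so on) are hypothetical.

```r
library(tidymodels)

# Simulated stand-in data (invented values); column names follow the lake ice example
set.seed(2024)
ice_sim <- tibble(year = 1900:2020) |>
  mutate(doy = 130 - 0.05 * (year - 1900) + rnorm(n(), sd = 5))

# Split into training and testing sets (80% / 20%)
ice_split <- initial_split(ice_sim, prop = 0.8)
ice_train <- training(ice_split)
ice_test <- testing(ice_split)

# Pre-processing recipe (no extra steps are needed for this simple case)
ice_recipe <- recipe(doy ~ year, data = ice_train)

# Model specification: linear regression fit with the lm engine
lm_spec <- linear_reg() |>
  set_engine("lm")

# Bundle recipe and model into a workflow, then fit on the training data
ice_fit <- workflow() |>
  add_recipe(ice_recipe) |>
  add_model(lm_spec) |>
  fit(data = ice_train)

# Predict the testing data and compute evaluation metrics
ice_results <- predict(ice_fit, new_data = ice_test) |>
  bind_cols(ice_test)

metrics(ice_results, truth = doy, estimate = .pred)
```

Each object maps onto one of the big picture steps: the split, the recipe (pre-processing), the model specification (method and package), the fit, and the evaluation. Swapping in a different method mostly means changing the model specification while the rest stays the same.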
Overview of module
Datasets
- Forest carbon from NEON (from prior module)
- Lake Ice (from prior module)
Plan
- Instruction in tidymodels applied to predicting carbon
- Assignment part 1: apply to predicting lake ice
- Instruction in tuning ML models
- Assignment part 2: apply to predicting forest carbon
Focal dataset
Predicting the mean vegetation carbon stocks for each plot in the NEON data
Columns
[1] "plotID" "daylength" "precip" "range" "tavg"
[6] "solar" "tmax" "tmin" "vpd" "elevation"
[11] "nlcdClass" "lat" "long" "plotType" "ndvi"
[16] "siteID" "plot_kgCm2"
Number of rows
Challenge
Predict out-of-sample data
- You have the “predictors” for 48 additional sites
- I have the carbon stocks
- You will submit your predictions of these 48 via Canvas
- I will summarize and compare the results.